Comparing Testing Approaches under Non-Proportional Hazards.
1. Weighted Log-rank test (WLRT).
- Peto-Peto.
- Modified Peto-Peto.
- Tarone-Ware.
- Gehan-Breslow/Wilcoxon.
- Fleming-Harrington.
2. Modestly Weighted Log-rank test.
- Unstratified
- Stratified
3. Max Combo test.
- Unstratified
- Stratified
Note on the Use of Custom R Functions:
The survminer package can be used to perform the Peto–Peto, modified Peto–Peto, Tarone–Ware, and Gehan–Breslow/Wilcoxon tests. However, its functionality is limited to generating p-values and does not provide the corresponding test statistics. The custom R function from Kassambara’s GitHub repository can be used to perform these tests. This function internally implements weighted log-rank tests and supports the log-rank, Gehan–Breslow, Tarone–Ware, Peto–Peto, modified Peto–Peto, and Fleming–Harrington tests. The custom function reproduces functionality previously available through survMisc::comp(), which is no longer available on CRAN. The custom functions used to perform each test, as presented in the R document, were derived from this GitHub-based implementation. Only the sections relevant to pairwise comparisons were retained, since the original custom function also supports k-group comparisons.
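For orientation, the two-group weighted log-rank statistic that underlies all of these tests can be written in its usual textbook form. The notation is introduced here for illustration and is not taken from the custom function's source: \(t_1 < \dots < t_D\) are the distinct event times, \(d_j\) and \(n_j\) are the pooled numbers of events and subjects at risk at \(t_j\), and \(d_{1j}\), \(n_{1j}\) are the same quantities in group 1.

\[U = \sum_{j=1}^{D} w_j \left( d_{1j} - n_{1j}\frac{d_j}{n_j} \right), \qquad V = \sum_{j=1}^{D} w_j^2 \, \frac{n_{1j}}{n_j}\left(1 - \frac{n_{1j}}{n_j}\right) \frac{n_j - d_j}{n_j - 1} \, d_j, \qquad \chi^2 = \frac{U^2}{V}\]

The tests differ only in the weights \(w_j\), where \(\hat{S}\) denotes the pooled Kaplan-Meier estimate and \(\tilde{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i + 1}\right)\):

\[\text{Log-rank: } w_j = 1 \qquad \text{Gehan-Breslow: } w_j = n_j \qquad \text{Tarone-Ware: } w_j = \sqrt{n_j}\]

\[\text{Peto-Peto: } w_j = \tilde{S}(t_j) \qquad \text{Modified Peto-Peto: } w_j = \tilde{S}(t_j)\,\frac{n_j}{n_j + 1} \qquad \text{Fleming-Harrington: } w_j = \hat{S}(t_{j-1})^{\rho}\,\bigl(1 - \hat{S}(t_{j-1})\bigr)^{\gamma}\]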
| Analysis | Supported in R | Supported in SAS | Match | Notes |
|---|---|---|---|---|
| WLRT- Peto-Peto | Yes | Yes | Yes | In R, the survminer::surv_pvalue(method = "S1") function computes the p-value for the Peto–Peto test. However, the survminer package does not provide a function for generating the corresponding test statistics. The custom R function can be used to perform this test and obtain both the p-value and chi-square statistic. The resulting values are comparable to those produced by SAS but not consistent with results from the coin::logrank_test() and survival::survdiff() functions. In SAS, this test is implemented using the LIFETEST procedure with a STRATA statement and the TEST=peto option. |
| WLRT- Modified Peto-Peto | Yes | Yes | Yes | The survminer::surv_pvalue(method = "S2") function in R generates the p-value for the modified Peto–Peto test. The custom R function can be used to obtain both the chi-square statistic and the p-value. In SAS, this test is performed using the LIFETEST procedure with a STRATA statement and the TEST=modpeto option. |
| WLRT-Tarone-Ware | Yes | Yes | Yes | The coin::logrank_test() function in R performs the Tarone–Ware test when the argument type = "Tarone-Ware" is specified. The survminer::surv_pvalue(method = "sqrtN") function computes the p-value for this test. However, because the survminer package does not provide a built-in function to return the test statistic, the custom R function can be used to compute the chi-square statistic along with a p-value consistent with the method = "sqrtN" implementation. The results obtained from the custom function and the survminer package agree with those produced in SAS using TEST=TaroneWare with the STRATA statement in PROC LIFETEST. In contrast, the results from coin::logrank_test() do not match the SAS output. The survminer::surv_pvalue() function computes p-values from survfit objects by comparing survival curves. Its default method is survdiff, which performs the standard log-rank test. |
| WLRT- Gehan-Breslow | Yes | Yes | Yes | In R, survminer::surv_pvalue(method = "n") and the custom R function both yield the p-value for the Gehan–Breslow test, corresponding to its canonical weighting scheme. The coin::logrank_test() function performs this test when the argument type = "Gehan-Breslow" is specified. In SAS, the equivalent test is obtained by specifying TEST=Wilcoxon in PROC LIFETEST. However, the results produced by coin::logrank_test() are not consistent with the SAS output. |
| WLRT- Fleming-Harrington | Yes | Yes | Yes | This test is computed in R using the nphRCT::wlrt() and coin::logrank_test() functions. In SAS, specify TEST=FH(\(\rho,\gamma\)) in the LIFETEST procedure. The results produced by nphRCT::wlrt() and the LIFETEST procedure are consistent. |
| Max Combo | Yes | Yes | Yes | In R, the unstratified test can be implemented using nph::logrank.maxtest() and the stratified test using strata.MaxCombo::SMCtest(); in SAS, both are performed with a macro. Additionally, the choice of weighting can be modified, as demonstrated in the SAS file. |
| Modestly Weighted Log-rank | Yes | No | No | In R, this test can be performed using the nphRCT::wlrt() function and specifying either the t* or s* parameter. Here, s* represents the fixed survival probability threshold, whereas t* denotes the time point at which the pooled survival probability reaches s* (see the referenced documentation for the definition of this test’s weight function). A stratified version of the test can be implemented by incorporating the strata() function. This approach provides both the individual test statistics for each stratum and the combined test statistic. To the knowledge of the CAMIS contributors, there is no direct implementation of this test in the SAS LIFETEST procedure. |
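For readers unfamiliar with the modestly weighted test, its weight function as defined by Magirr & Burman can be sketched as follows, where \(\hat{S}\) is the pooled Kaplan-Meier estimate and \(\hat{S}(t-)\) its value just before time \(t\). This is a summary of the published definition, not a description of the internals of nphRCT::wlrt().

\[w(t) = \frac{1}{\max\{\hat{S}(t-),\ \hat{S}(t^*)\}} \qquad \text{or, in terms of } s^*, \qquad w(t) = \frac{1}{\max\{\hat{S}(t-),\ s^*\}}\]

The weights start at 1, increase as the pooled survival estimate falls, and stay constant at \(1/\hat{S}(t^*)\) (or \(1/s^*\)) once the threshold is reached, so late events are up-weighted only by a bounded amount.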
Comparison Results.
\[H_0 : S_1(t) = S_2(t) \ \forall t \quad \text{vs.} \quad H_1 : S_1(t) \neq S_2(t) \text{ for some } t.\]
Note: coin::logrank_test() uses the asymptotic distribution by default and reports a \(Z\) statistic; under the null hypothesis, \(Z^2 \sim \chi^2_{(1)}\).
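To compare a \(Z\) statistic with a chi-square statistic on the same scale, the \(Z\) value can be squared. The short base R sketch below does this for the Fleming-Harrington FH(1,0) values reported in the comparison table (Z = 3.0423 from coin::logrank_test(), chi-square = 9.9 from nphRCT::wlrt()); the variable names are illustrative only.

```r
# Z statistic reported by coin::logrank_test() for FH(1,0) in the comparison table
z <- 3.0423

# Convert to the chi-square scale (1 degree of freedom)
chisq_from_z <- z^2

# The two-sided p-value is the same whichever scale is used
p_from_chisq <- pchisq(chisq_from_z, df = 1, lower.tail = FALSE)
p_from_z     <- 2 * pnorm(-abs(z))

# Chi-square reported by nphRCT::wlrt() for FH(1,0) in the comparison table
chisq_nphRCT <- 9.9

print(c(chisq_from_z = chisq_from_z, chisq_nphRCT = chisq_nphRCT))
print(c(p_from_chisq = p_from_chisq, p_from_z = p_from_z))
```

The squared \(Z\) value is roughly 9.26 rather than 9.9, so the mismatch flagged in the table is not merely a change of scale between \(Z\) and chi-square.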
| Test | Statistic | Function in R | R Result | Function in SAS | SAS Result | Match | Notes |
|---|---|---|---|---|---|---|---|
| WLRT- Peto-Peto | Chi-square | | 9.8238; Z=3.0423; 9.9000 | PROC LIFETEST with STRATA group / test=peto | 9.8238 | Yes; No; No | survminer::surv_pvalue() prints the p-value; the custom R function generates both the chi-square statistic and the corresponding p-value. |
| | P-value | | 0.0017; 0.0017; 0.0023; 0.0020 | PROC LIFETEST with STRATA group / test=peto | 0.0017 | Yes; Yes; No; No; No | |
| WLRT- Modified Peto-Peto | Chi-square | Custom R function | 9.7491; Z=3.0276 | PROC LIFETEST with STRATA group / test=modpeto | 9.7491 | Yes; No | |
| | P-value | | 0.0018; 0.0018; 0.002465 | PROC LIFETEST with STRATA group / test=modpeto | 0.0018 | Yes; Yes; No | |
| WLRT- Tarone-Ware | Chi-square | | 9.4230; Z=2.9636 | PROC LIFETEST with STRATA group / test=taroneware | 9.4230 | Yes; No | |
| | P-value | | 0.0021; 0.0021; 0.0030 | PROC LIFETEST with STRATA group / test=taroneware | 0.0021 | Yes; Yes; No | |
| WLRT- Gehan-Breslow/Wilcoxon | Chi-square | | 8.2593; Z=2.7863 | PROC LIFETEST with STRATA group / test=wilcoxon | 8.2593 | Yes; No | |
| | P-value | | 0.0041; 0.0041; 0.0053 | PROC LIFETEST with STRATA group / test=wilcoxon | 0.0041 | Yes; Yes; No | |
| Fleming-Harrington | Chi-square | nphRCT::wlrt() | FH(0.5,0.5)=10.3122; FH(1,1)=9.8019; FH(0,1)=9.5455; FH(0.5,2)=8.32428; FH(1,0)=9.9 | PROC LIFETEST with STRATA group / test=FH() | FH(0.5,0.5)=10.3122; FH(1,1)=9.8019; FH(0,1)=9.5455; FH(0.5,2)=8.32428; FH(1,0)=9.9 | Yes | |
| | | coin::logrank_test() | FH(0.5,0.5): Z=3.0582; FH(1,1): Z=2.9720; FH(0,1): Z=2.9256; FH(0.5,2): Z=2.7163; FH(1,0): Z=3.0423 | | | No | |
| | P-value | nphRCT::wlrt() | FH(0.5,0.5)=0.0013; FH(1,1)=0.0017; FH(0,1)=0.0020; FH(0.5,2)=0.0041; FH(1,0)=0.0017 | PROC LIFETEST with STRATA group / test=FH() | FH(0.5,0.5)=0.0013; FH(1,1)=0.0017; FH(0,1)=0.0020; FH(0.5,2)=0.0041; FH(1,0)=0.0017 | Yes | |
| | | coin::logrank_test() | FH(0.5,0.5)=0.0022; FH(1,1)=0.0030; FH(0,1)=0.0034; FH(0.5,2)=0.0066; FH(1,0)=0.0023 | | | No | |
| Modestly Weighted Log-rank (unstratified) | Chi-square | nphRCT::wlrt() | 11.2786 | No direct implementation | Null | No | This function can perform several types of modestly weighted log-rank tests and the Fleming-Harrington(\(\rho,\gamma\)) test, in addition to the standard log-rank test. |
| | P-value | nphRCT::wlrt() | 0.0008 | No direct implementation | Null | No | |
| Modestly Weighted Log-rank (stratified) | Chi-square | nphRCT::wlrt() | strata1=7.5185; strata2=3.7418; Combined=10.8359 | No direct implementation | Null | No | |
| | P-value | nphRCT::wlrt() | strata1=0.0061; strata2=0.0531; Combined=0.0010 | No direct implementation | Null | No | |
| Max Combo (unstratified) | Chi-square | nph::logrank.maxtest() | 3.30 (Z test) | SAS macro | 3.30152 | Yes | In R, the test is two-sided by default unless specified otherwise. |
| | P-value | nph::logrank.maxtest() | 0.00196 \(\approx\) 0.0020; Bonferroni-adjusted p-value = 0.00385 | SAS macro | 0.0020 | Yes | |
| Max Combo (stratified) | Chi-square | strata.MaxCombo::SMCtest() | Z1=3.1813; Z2=3.1813; Z3=3.2738 | SAS macro | 3.18132 | Yes; Yes; No | In R, the test outputs multiple p-values corresponding to different covariance estimators; the SAS macro generates a combination test with a single Z max and p-value. The first pval and z.max are the closest to those of the SAS macro. |
| | P-value | strata.MaxCombo::SMCtest() | p1=0.0030; p2=0.0026; p3=0.0021 | SAS macro | 0.0034 | Closest, with a slight difference; No; No | The SAS macro was modified to accommodate a stratifying variable. The adjustments are documented in the SAS document. |
Summary and Recommendation.
Combining several weighted log-rank statistics is a robust alternative to a single weighted log-rank test for detecting differences between survival curves under non-proportional hazards. However, some authors have urged caution with combination tests, because they risk declaring statistical significance where there is no clinical benefit: for instance, when treatment is uniformly worse than control, Max Combo can still have a high chance of rejecting the null hypothesis in favour of treatment. Magirr & Burman developed the modestly weighted log-rank test to counter these issues, especially in the delayed-effect case; its weights are controlled so that a worse treatment effect at early time points is not rewarded.
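In general terms, and assuming a two-sided test built from \(K\) Fleming-Harrington statistics \(Z_1, \dots, Z_K\) (the notation here is illustrative), the max-combo statistic and its p-value can be sketched as:

\[Z_{\max} = \max_{k = 1, \dots, K} |Z_k|, \qquad p = 1 - P\bigl(|Z_1| \le z_{\max}, \dots, |Z_K| \le z_{\max}\bigr),\]

where the probability is evaluated under the asymptotic multivariate normal distribution of \((Z_1, \dots, Z_K)\) with mean zero and an estimated correlation matrix. Ignoring the correlation gives the more conservative Bonferroni bound

\[p_{\text{Bonf}} = \min\bigl(1,\ K \min_k p_k\bigr) \ge p,\]

which is in line with the unstratified results above, where the Bonferroni-adjusted p-value (0.00385) is larger than the max-combo p-value (0.00196).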
The Wilcoxon test reported in the SAS documentation corresponds to the Gehan-Breslow test in R. For the Peto-Peto, modified Peto-Peto, Gehan-Breslow/Wilcoxon and Tarone-Ware tests, survminer::surv_pvalue() and the custom R function can be used to reproduce the results of the SAS procedures. Both base their calculations on the weight definitions of these tests; for example, they use the size of the risk set and the square root of the size of the risk set as weights for the Gehan-Breslow/Wilcoxon and Tarone-Ware tests, respectively. coin::logrank_test() and survival::survdiff() implement these tests differently, as discussed earlier: coin implements a general framework for conditional inference procedures, commonly known as permutation tests, whereas survival::survdiff() uses a hypergeometric variance formulation to implement the Mantel-Cox log-rank test and supports the \(G^\rho\) family of tests.
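The weight definitions mentioned above can be made concrete with a minimal base R sketch of a two-group weighted log-rank test. This is not the custom function from Kassambara's repository and not the survminer implementation; the function name wlrt_sketch(), its arguments, and the toy data are all hypothetical, and the output is not expected to reproduce any value in the comparison table.

```r
# Illustrative two-group weighted log-rank test in base R (sketch only).
# status: 1 = event, 0 = censored. All names here are hypothetical.
wlrt_sketch <- function(time, status, group,
                        weight = c("logrank", "gehan", "tarone",
                                   "peto", "modpeto", "fh"),
                        rho = 0, gamma = 0) {
  weight <- match.arg(weight)
  grp <- as.character(group)
  g1  <- grp == sort(unique(grp))[1]            # first label is "group 1"
  tj  <- sort(unique(time[status == 1]))        # distinct event times

  n  <- sapply(tj, function(t) sum(time >= t))                  # pooled at risk
  d  <- sapply(tj, function(t) sum(time == t & status == 1))    # pooled events
  n1 <- sapply(tj, function(t) sum(time >= t & g1))             # group 1 at risk
  d1 <- sapply(tj, function(t) sum(time == t & status == 1 & g1))

  km      <- cumprod(1 - d / n)                 # pooled Kaplan-Meier at t_j
  km_prev <- c(1, km[-length(km)])              # Kaplan-Meier at t_{j-1}
  s_tilde <- cumprod(1 - d / (n + 1))           # Peto-Peto survival estimate

  w <- switch(weight,
    logrank = rep(1, length(tj)),
    gehan   = n,                                # size of the risk set
    tarone  = sqrt(n),                          # square root of the risk set
    peto    = s_tilde,
    modpeto = s_tilde * n / (n + 1),
    fh      = km_prev^rho * (1 - km_prev)^gamma)

  u <- sum(w * (d1 - n1 * d / n))               # weighted observed - expected
  v_terms <- w^2 * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) * d
  v <- sum(v_terms[n > 1])                      # risk sets of size 1 add no variance

  chisq <- u^2 / v
  c(chisq = chisq, p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Made-up toy data, for illustration only
time   <- c(1, 2, 3, 5, 8, 12, 6, 7, 10, 15, 19, 25)
status <- c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1)
group  <- rep(c("A", "B"), each = 6)

print(sapply(c("logrank", "gehan", "tarone", "peto", "modpeto"),
             function(w) wlrt_sketch(time, status, group, weight = w)))
```

Because only the weight vector changes between tests, a single function covers the whole family; Fleming-Harrington with rho = 0 and gamma = 0 reduces to the standard log-rank test.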
References.
- wlrt() documentation: https://search.r-project.org/CRAN/refmans/nphRCT/html/wlrt.html
- survdiff() documentation: https://www.rdocumentation.org/packages/survival/versions/3.8-3/topics/survdiff
- survminer package documentation: https://cran.r-project.org/web/packages/survminer/survminer.pdf
- logrank.maxtest() documentation: https://search.r-project.org/CRAN/refmans/nph/html/logrank.maxtest.html
- Robust modestly weighted log-rank tests documentation: https://arxiv.org/html/2412.14942v1
- nphRCT package documentation: https://cran.r-project.org/web/packages/nphRCT/nphRCT.pdf
- LIFETEST procedure documentation: https://documentation.sas.com/doc/en/statug/15.2/statug_lifetest_syntax01.htm
- Combination weighted log-rank tests documentation: https://support.sas.com/resources/papers/proceedings20/5062-2020.pdf
- LIFETEST procedure documentation: https://support.sas.com/documentation//cdl/en/statug/68162/HTML/default/viewer.htm#statug_lifetest_details16.htm
- Stratified modestly weighted log-rank test documentation: https://cran.r-project.org/web/packages/nphRCT/vignettes/weighted_log_rank_tests.html
- Stratified Max-Combo documentation: https://cran.r-project.org/web/packages/strata.MaxCombo/strata.MaxCombo.pdf
- Max-combo documentation: https://search.r-project.org/CRAN/refmans/nph/html/logrank.maxtest.html